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Classical rate theories fail in cases where the observable(s) or order parameter(s) used are poor 
reaction coordinates and no clear separation between reactants and products is possible. Here, we 
present a general rate theory for ergodic dynamical systems in thermal equilibrium that explicitly 
takes into account how the system is observed. The theory allows the systematic estimation errors 
made by standard rate theories to be understood and quantified. We also elucidate the connection 
of spectral rate theory with the popular Markov state modeling (MSM) approach for molecular 
simulation studies. An optimal rate estimator is formulated that gives robust and unbiased results 
even for poor reaction coordinates and can be applied to both computer simulations and single- 
molecule experiments. No definition of a dividing surface is required, and a measure of the reaction 
coordinate quality (RCQ) becomes readily available. Additionally, the respective projected prob- 
ability distributions can be obtained for the reactant and product states along the observed order 
parameter, even when these strongly overlap. The approach is demonstrated on numerical examples 
and experimental single-molecule force probe data, here focusing on the case of two-state kinetics. 

PACS numbers: 
Keywords: 



The description of complex molecular motion through 
simple kinetic rate theories has been a central concern of 
statistical physics. A common approach, first-order rate 
theory, treats the relaxation kinetics among distinct re- 
gions of configuration space by single-exponential relax- 
ation. There has been a recent interest in estimating such 
rates from fluctuation trajectories of single molecules, re- 
sulting from the recent maturation of measurement tech- 
niques that are able to collect extensive traces of single 
molecule extensions or fluorescence [121 ITS] , When the 
available observable or order parameter is a good reaction 
coordinate that allows the slowly-converting states to be 
clearly separated (see Fig [I] top left), classical rate theo- 
ries apply and the robust estimation of transition rates is 
straightforward using a variety of means [32 . However, 
in the common case in which the reaction coordinate is 
poor and the separation of the slowly-converting states 
is not obvious, a satisfactory theoretical description is 
missing and many estimators break down. 

Most rate theories and estimators are based upon di- 
viding the observed state space into a reactant and a 
product substate and then in some way counting tran- 
sition events that cross the dividing surface. Transition 
state theory measures the instantaneous flux across this 
surface, which is known to overestimate the rate due to 
the counting of unproductive recrossings over the divid- 
ing surface on short timescales [TU] . Reactive flux theory 
[3] has proposed to cope with this by counting a transi- 
tion event only if it has succeeded to stay on the product 
side after a sufficiently long lag time r. Reactive flux 
theory involves derivatives of autocorrelation functions 
that are numerically unreliable to evaluate [TJ. In prac- 
tice, one therefore typically estimates the relaxation rate 
via an exponential fit to the tail of a suitable correlation 
function, such as the number correlation function of re- 
actants or the autocorrelation function of the experimen- 



tally measured signal [9J |29j [32]. In order to transform 
this relaxation rate into a forward and backward rate 
constant, a clear definition of reactant and product is 
needed, requiring that they are separable in the available 
observable. 

Markov models (MSMs) have recently become a popu- 
lar approach to producing a simplified statistical model of 
complex molecular dynamics from molecular simulations 
and can be regarded as an attempt towards a multistate 
rate theory. MSMs use a transition matrix describing 
the probability a system initially found in a substate i is 
found in substate j a lag time r later. When the state 
division allows the metastable states of the system to be 
distinguished jH |5J [20] 1M]> the transition matrix with 
a sufficiently large choice of r can be used to derive a 
phenomenological transition rate matrix that accurately 
describes the interstate dynamics [3(5] . This is explicitly 
done for the two-state case in (7j. It has been shown in 
|24l |2"5] that by increasing the number of substates used 
to partition state space, and hence using multiple divid- 
ing surfaces instead of a single one, these rate estimates 
become more precise. In the limit of infinitely many sub- 
states, the eigenfunctions of the dynamical propagator 
in full phase space are exactly recovered, and the rate 
estimates become exact even for r — >• + [14J. In prac- 
tice, however, a finite choice of r is necessary in order 
to have a small systematic estimation error, especially if 
"'uninteresting" degrees of freedom such as momenta or 
solvent coordinates are discarded. An alternative way of 
estimating transition rates is by using a state definition 
that is incomplete and treats the transition region im- 
plicitly via committor functions that may better approx- 
imate the eigenfunctions of the dynamical propagator in 
this region [2| [28] . 

The quality of the rate estimates in all of the above 
approaches relies on the ability to separate the slowly- 
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converting states in terms of some dividing surface or 
state definition. The above approaches break down in 
the case where the available observables or order param- 
eter^) do not permit such a separation, i.e. when kinet- 
ically distinct states overlap in the histogram of the ob- 
served order parameter. However, such a scenario may of- 
ten arise in single-molecule experiments where the avail- 
able order parameter depends on what is experimentally 
observable and may not necessarily be a good reaction co- 
ordinate. In principle, Hidden Markov Models (HMMs) 
are able to estimate transition rates even in such situ- 
ations [13 [TTJ [THl US]. However, HMMs need a proba- 
bility model of the measurement process to be defined, 
they are relatively complex to handle and implement, and 
they represent a machine learning approach rather than 
a physical theory. 

Here, we present a general rate theory for stochas- 
tic dynamics that are observed on a possibly poor re- 
action coordinate. It is assumed that the dynamical law 
governing evolution of the system is a time-stationary 
Markov process in full-dimensional phase space, obeys 
microscopic detailed balance, and supports a unique sta- 
tionary distribution — criteria satisfied for many physical 
systems of interest. These dynamics are then projected 
onto some observable in which they are no longer Marko- 
vian and may not easily be separable by a simple divid- 
ing surface. This framework allows us to (i) evaluate the 
quality of existing estimators and propose optimal esti- 
mators for the slowest relaxation rate, (ii) provide an ab- 
solute measure of reaction coordinate quality (RCQ), and 
(iii) derive an optimal estimator for the probabilities of 
and transition rates between the slowly converting states, 
even if these strongly overlap in the observation. 

The present rate theory is exclusively concerned with 
the systematic error in estimating rates and proposes "op- 
timal" methods that minimize this systematic rate esti- 
mation error. Therefore all statements arc strictly valid 
only in the data-rich regime. Explicit treatment of the 
statistical error in the data-poor regime is beyond the 
scope of the present work but discussed at the end of the 
paper. 

Full-space dynamics 

We consider a dynamical system that follows a station- 
ary and time-continuous Markov process x t in a gener- 
ally large and continuous state space fl. x t is assumed to 
be ergodic with a unique stationary density fx{x). In 
order to be independent of specific dynamical models 
we use the general transition density p T (x t , x t+T ), i.e. 
the probability density that given the system is at point 
Xt € fi at time t, it will be found at point x t+r G fi 
a lag time r later. We will at this point also assume 
that the dynamics obey microscopic detailed balance, i.e. 
(i(x t )p T (x t ,x t+T ) = /j,(x t+T )p T (x t+T ,x t ), which is true 



for systems that are not driven by external forces. In 
this case, /i(x) is a Boltzmann distribution in terms of 
the system's Hamiltonian. The following rate theory can 
also be formulated for some nonreversible dynamics, but 
the notation becomes more involved and the quantities 
are more difficult to interpret; we therefore restrict our 
discussion to the reversible case here. 

The transition density can be expanded into relaxation 
processes that are associated with different intrinsic rates 
by developing it in terms of the eigenvalues and eigen- 
functions of the corresponding transfer operator |2~4l |2"o] : 

oo 

p T (x t ,X t+T ) = ^e~ Kzr Tpi(x t )fl(x t+ r)lpi(X-t + r) (1) 
1=1 

We are interested in the slow processes. Here, 

Ai(r) = e- K * r (2) 

are eigenvalues of the propagator that decay exponen- 
tially with lag time r. We order relaxation rates accord- 
ing to Ki < K2 < K3 < ... and thus Ai(t) > Aa(r) > 
M(t) > The first term is special in that it is the 
only stationary process: K4 = 0, Ai(t) = 1, ipi(x) = 1, 
thus the first term of the sum is identical to fJ.(x). All 
other terms can be assigned a finite relaxation rate k,, 
or a corresponding relaxation timescale ti = kJ , which 
are quantities of our interest. The eigenfunctions ipi are 
independent of r and determine the structure of the re- 
laxation process occuring with rate Ki. The sign struc- 
ture of ipi(x) determines between which substates the 
corresponding relaxation process is switching and is thus 
useful for identifying metastable sets, i.e. sets of states 
that are long-living and interconvert only by rare events 
[241 |2"T] . The eigenfunctions are chosen to obey the nor- 
malization conditions 

(i>i,tpj) n = J dx V> 4 (x)?/>j( x )m(x) = 5ij. (3) 

At a given timescale r of interest, fast processes with 
k 3> t _1 (and correspondingly ti -C t) will have effec- 
tively vanished and we are typically left with relatively 
few slowly-relaxing processes. 

Finally, we define the correlation density c r (xt,x t+T ), 
i.e. the joint probability density of finding the system at 
x t at time t and at x t+T at time t + t: 

c T (x t ,x t+T ) = n(x t )p T (x t ,x t+T ) (4) 

Observed dynamics and rate theory 

Let us consider the case that we are only interested in a 
single relaxation process - the slowest. Below, we sketch 
a rate theory for this case. Details of the derivation can 
be found in the Supplementary Information. Based on 
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the definitions above, the correlation density can then be 
written as: 

c r (x t ,x t+r ) = /Lt(x t )/i(x t+T ) 

+C~ K2 V(x t )?/>2 (X t )n(x t+T )lp2 (X t+T ) 

+e~ K3 V(x t )p Tifast (x t ,x t+r ) (5) 

with 



•,fast(x t ,x t+r ) = J^e (Ki K3)r i/' l (xi)/i(x t+T )?/' l (x 



t+rj 



i=3 



(6) 

subsuming the fast processes. 

Exact rate: The exact rate of interest k 2 can be re- 
covered as follows: If we know the exact corresponding 
eigenfunction ip2 (x), it follows from Eq. ([IJ and |3| that 
its autocorrelation function evaluates to: 

Aa(r) = (M0)Mr)) 

= jj dx t dx t+T C T (X t , X t+T ) ^2 (x t ) 02 (x t +r ) 

= e" K2T (7) 

where (•) denotes the time average, that is here identical 
to the ensemble average due to ergodicity of the dynam- 
ics. (ip2 (0)^2 (t)) yields the exact eigenvalue A2(r) and 
thus also an exact rate estimate of k 2 = — lnA 2 (T) = 
«2) independently of the choice of r. 

Projected dynamics: We may only observe a (possi- 
bly poor) order parameter y(x) : f2 — > K upon which 
the high-dimensional dynamics is projected. When his- 
togramming the observed dynamics in y, this histogram 
will, for sufficient statistics, approximate the stationary 
probability density projected onto y: 



dx5{y' - j/(x))y«(x) 



(8) 



Since the observer has only information about the dy- 
namics in y available, the eigenfunction ip2 is not acces- 
sible as it is defined in the whole state space f2. Thus, it 
is not possible to calculate the autocorrelation function 
of ip 2 exactly and Eq. ([7]) cannot be used to estimate the 
rate k 2 . At best, we have an approximate model function 
ip2{y ( x ))j which we require to be normalized by 



(9) 



i{>2 approximates the true eigenfunction ■02 that is a func- 
tion of full phase space x only as a function of y: 



^2(y(x)) = (-02, 02^02 + ^2(i>2,ipi)fi'>Pi 



i>2 



/cr0 2 (x) + e(x), 



(10) 



where the true eigenfunction ■02 enters with coeffi- 
cient y/a — (02, 02) ii while the remaining eigcnfunc- 
tions contribute to the error e(x). The autocorrela- 
tion function of ?/> 2 , which we denote as (0 2 (O)0 2 (r)) = 
(^a(y(xt))V'2(j/(xt +r ))), now evaluates to [TS] 



A 2 (r) = (&(0)&(r)> 



ae 



(11) 



i>2 



where 



(^2,-02) 



This autocorrelation function does not yield the exact 
eigenvalue A 2 (t), but some approximation A2(t). For 
t k^ 1 , which can readily be achieved for clear two- 
state processes where k 2 3> W3, the sum on the right 
hand side disappears: 



A 2 (r) 



ae 



(12) 



This suggests that the true rate, as well as a measure of 
reaction coordinate quality, could be recovered from large 
tau decay of an appropriately good trial function even 
from the projected process. We elaborate this concept in 
subsequent sections. In experiments, the relaxation rates 
K2,^3, etc, are initially unknown and hence the validity 
of Eq. ( 12 1 can only be checked a posteriori, e.g. by the 



fact that estimates based upon Eq. ( 12 1 are independent 
of the lag time r. 



Existing rate estimators 

Many commonly used rate estimators consist of two 
steps: (1) they (explicitly or implicitly) calculate an au- 
tocorrelation function A2 (r) of some function 2 , and (2) 
transform A2(r) into a rate estimate « 2 . In order to de- 
rive an optimal estimator, it is important to understand 
how the systematic error of the estimated rate on each 
of the two steps. Please refer to the SI for a derivation 
of the subsequent results. 

Many rate estimators operate by defining a single di- 
viding surface which splits the state space into reactants 
A and products B. Calling h A (y) the indicator function 
which is 1 for set A and for set B, one may define the 
normalized fluctuation autocorrelation function of state 
A E2 



A2W 



(h A (0)h A (r)) - (h A ) 



(MO)Mr)) (13) 



that can also be interpreted as an autocorrelation func- 
tion \2(t) with a step function ip2, divide (y) — (hA(y) — 
7T A)/y /7r A 7r B- Here, tt a = {h A )^ is the stationary proba- 
bility of state A and txb — ^~^a the stationary probabil- 
ity of state B. Other rate estimates choose 2 to be the 



4 



signal y(t) itself or the committor function between two 
pre-defined subsets of the y coordinate |2H]. We show 
that none of these choices is optimal, and the optimal 
choice of tp 2 will be derived in the subsequent section. 

Existing rate estimators largely differ by step (2), i.e. 
how they transform \2(t) into a rate estimate k 2 . This 
procedure then determines the functional form of the sys- 
tematic estimation error. We subsequently list bounds 
for these errors (see SI for derivation). 

Reactive flux rate. Chandler, Montgomery and 
Berne [3j[T7] considered the reactive flux correlation func- 
tion as a rate estimator: k 2 ^(t) — — J^A 2 (r). Its error 
is 



i>2 



(14) 



which becomes for a perfect reaction coordinate (i.e. a 
perfect choice of ip 2 = '02 that leads to a = 1) but can 
be very large otherwise. 

Transition state theory rate. The transition state the- 
ory rate, which measures the instantaneous flux across 
the dividing surface between A and B, can be com- 
puted from the short-time limit of the reactive flux [3J, 
«2,tst = lim r ^o+ ^2,rf (t) such that the error in the is 
given by 

«2,tst - K 2 = K 2 (a - 1) + /]{ij>2, j>i) 2 u K2 > &2,rf ~ K 2 



i>2 



(15) 



which is always an overestimate of the true rate and the 
reaction flux rate. 

Integrating the correlation function. Another means of 
estimating the rate is via the integral of the correlation 

function, K 2 ,mt = - dr A 2 (r)J (see, e.g., Equa- 
tion 3.6 of [3J), with the error: 



1 



K 2 ,int ^ «2 



K2 



2 K2 
M Ki 



+ E l > 2 (V'i ! V'2) 



2 «2 



in the special case that K3 3> k 2 (time scale separation), 
the error is approximately given by k 2 (1 — a) /a. Thus, 
the error of this estimator becomes zero for the ideal case 
of an ideal reaction coordiante (a = l), but may be very 
huge if the reaction coordinate is poor (a < 1). 

Single-r rate estimators: A simple rate estimator is 
to take the value of the autocorrelation function of some 
function ip 2 a t a single value of r, and transform it into 
a rate estimate by virtue of Eq. (12 1. We call these 
estimators single-r estimators. Ignoring statistical un- 
certainties, they yield a rate estimate of the form 



^2, single 



lnA 2 (r) 



(16) 



Quantitatively, the error can be bounded by the expres 
sion (see Supplementary Information): 

In a 

K 2, single — K 2 



< - 



(17) 



The error becomes identical to this bound for systems 
with a strong timescale separation, k 3 3> k 2 . Eq. (17 1 
decays relatively slowly in time (with r -1 . See Fig. [l| 
for a two-state example). It will be shown below that 
methods that estimate rates from counting the number 
of transitions across a dividing surface, such as Markov 
state models, are single-r estimators and are thus subject 
to the error given by Eq. (17 1. 



The systematic error of single-r estimators results from 
the fact that Eq. ( [l6| effectively attempts to fit the tail 
of a multiexponential decay A 2 (r) by a single-exponential 
with the constraint A 2 (0) = 1. Unfortunately, the abil- 
ity to improve these estimators by simply increasing r is 
limited because the statistical uncertainty of estimating 



Eq. 12 quickly grows in r |33| . 

Multi-r rate estimators: To avoid the error given by 
Eq. ( 17 ) , it is advisable to estimate the rate by evaluating 



the autocorrelation function A 2 (r) at multiple values of 
r. This can be done e.g. by performing an exponential 
fit to the tail of the A 2 (r) , thus avoiding the constraint 
A 2 (0) = 1 |29l l3"2"] . The estimation error k 2 — k 2 has the 
form 

«2, mu it - K2 < c— e - Tl ^- K2 > (18) 



where t\ is the first lag time from the series (r 1; ...,r m ) 
used for fitting, and the constant c also depends on the 
lag times and the fitting algorithm used. The Supple- 
mentary Information shows that for several fitting al- 
gorithms, such as a least-squares procedure at the time 
points (r, 2r, bit), c is such that 



K 2,mult Sl K 2,single- 



(19) 



Thus, the multi-r estimator always is always better than 
the single-r estimator (see SI). The main advantage of 
multi-r estimators is that their convergence rate is ex- 
ponential in r when the time scale separation K3 — k 2 is 
not vanishing (compare to Eq. 17 1. Thus, multi-r es- 



timators are better the larger the timescale separation 
between the slowest and the other relaxation rates in the 
system is. Note that the systematic error of a multi-r 
estimate thus decays much faster in r than its statistical 
error that decays by r _1 |33| . 

In the absence of statistical error, all of the above rate 
estimation methods yield an overestimation of the rate, 
k 2 > k 2 . 



Best reaction coordinate - optimal choice of ifa 

It was shown above that multi-r estimators are the 
best choice for converting an autocorrelation function 
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into a rate estimate. However, what is the best possible 
choice -02 = "02,optimai given that only a specified order 
parameter y is observable? In other words, which func- 
tion should the observed dynamics be projected upon in 
orde r to obtain an optimal rate estimator? Following Eq. 
( 16 1 , the optimal choice 2 is the one which maximizes 
the parameter a, as this will minimize the systematic er- 
ror from a direct rate estimation by virtue of Eq. (17 1, 



The pre-factor a in Eq. (12 1 is a measure between 



and 1 quantifying how good a reacti on c oordi nat e the 
model 02 (y) is, and by virtue of Eqs. (17 1 and (18 I how 



and also minimize the systematic error involved in esti- 
mating k 2 from an exponential fit to Eq. (12). We are 
thus seeking the solution of: 

02 = arg max a = arg max A2 (r) (20) 
subject to the normalization Eq. (|9|). Here, arg max 



denotes the function that maximizes a over the space 
of functions 02 (y). For a sufficiently large set of n 
basis functions, j(y) — (71 (y), 7n(y)), the optimal 
eigenfunction approximant 02 is approximated by a lin- 
ear combination 02 (y) ~ SILi c ili{y) with coefficients 
c = (ci, c n ). When *y(y) is chosen to be an orthogonal 
basis set, then 2 = argmax^ 2 a can be approximated by 
the Ritz method [19, 25 . An easy way to do this approx- 
imation in practice is to perform a fine discretizion of y 
by histogram windows. Using a binning with bin bound- 
aries yx, y n +ii and the corresponding indicator func- 
tions 7i = h[ yit y i+1 ], then the above optimization problem 
is solved by estimating the transition probability matrix 
with elements 

Tij = P(y(r) e [ yj ,y j+ i] I y(0) e [yi,y i+ i]) (21) 
and calculating c as the second eigenvector: 



Tc = A9C 



(22) 



where A2 < 1 is the second-largest eigenvalue of T. Note 
that a given optimal 02 (y) can still be used with single-r 
and multi-r rate estimators that would produce different 
estimates for K2- 



Reaction coordinate quality (RCQ) 

Evaluating how well a given putative reaction coor- 
dinate captures complex dynamical behavior is of great 
general interest. Previous studies have proposed ways to 
measure the reaction coordinate quality (RCQ) that are 
based on comparing the observed dynamics to specific 
dynamical models or testing the ability of the observable 
to model the committor or splitting probability between 
two chosen end-states A and B |23J. These metrics are 
either only valid for specific models of dynamics or them- 
selves require a sufficiently good separation of A and B 
by definition, restricting their applicability to observables 
with rather good RCQs. 



large the error in our rate estimate can be. The RCQ 
a can be estimated by an e xponential fit to the autocor- 
relation function Eq. ( 12 1 of the model function 0g(y) 
employed. For the optimal choice 2 — 02 (Eq. (20)), 
we denote this prefactor a, where a = a(02) > c*(02), 
i.e. a is the best possible (maximal) RCQ of the observ- 
able used. Thus, we propose a as a general measure for 
the quality of the reaction coordinate. This RCQ is as 
general as possible, as it makes no assumptions about 
the class of dynamics in the observed coordinate, and 
does not depend on any subjective choices such as the 
choice of two reaction end-states A and B in terms of 
the observable y. Through the derivation above it has 
also been shown that a measures the fraction of ampli- 
tude by which the slowest process is observable, which 
is exactly the property one would expect from a mea- 
sure of the RCQ: a is 1 for a perfect reaction coordinate 
and if the slowest process is exactly orthogonal to the 
observable. 

Finally, a can be determined by the spectral estima- 
tion procedure described below. Figures [T] and [2] show 
estimates of the RCQ a of different projected dynamics 
using different rate estimators. 



Markov (state) models (MSMs) 

MSMs have recently gained popularity in the model- 
ing of stochastic dynamics from molecular simulations 
(H EJ EH (30j E]. MSMs can be understood as a way 
of implicitly performing rate estimates via discretizing 
state space into small substates. Let us consider a MSM 
obtained by finely discretizing the observed space y into 
bins and estimating a transition matrix T(r) amongst 
these bins. We have seen that this procedure approx- 
imately solves the optimization problem of Eq. ( 20 1 



and the leading eigenvector of T(r) approximates the 
best possible reaction coordinate, 02 (y), available for the 
given observable y. Ref J3DJ has suggested to use the im- 
plied timescale i 2 = — r/ln(A2(r)) as an estimate for the 
system's slowest relaxation timescale, and at the same 
time for a test which choice of r leads to a MSM with 
a small approximation error. These implied timescales 
correspond to the inverse relaxation rates and therefore 



the MSM rate estimate is described by Eq. ( 16 1 with the 
choice 02 = 02 ■ A sufficiently fine MSM thus serves as an 
optimal single-r rate estimator as it uses the maximum 
RCQ a for observables that are being discretized. How- 
ever, when these observables have a poor RCQ a since 
they are poorly separating the slowly-converting states, 
there is a substantial rate estimator error according to 
Eq. (17) that decays slowly with t -1 . This explains the 
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slow convergence of implied timescales shown in recent 
MSM studies [H H HH HI HI] • 



Spectral estimation 

The optimal estimator for n 2 is thus one that fits the 
exponential decay of k 2 while minimizing the fitting er- 
ror Eq. (18 1. As analyzed above, the systematic fitting 
error is minimized by any multi-r estimator. In order 
to obtain a numerically robust fit, especially in the case 
when statistical noise is present, it is optimal to fit to an 
autocorrelation function A2 (t) where the relevant slowest 
decay has maximum amplitude a. This is approximately 
achieved by constructing a fine discretization MSM on 
the observed coordinate (see Section "Best reaction coor- 
dinate"). Thus, the optimal estimator of k 2 proceeds as 
follows: 

1 . Obtain a fine discretization of the observed coordi- 
nate y into n bins, say [yi,yi+i] for i 6 l...n. 

2. Construct a row-stochastic transition matrix T(t) 
for different values of r. The estimation of tran- 
sition matrices from data have been described in 
detail [24 . A simple way of estimating T(r) is the 
following: (i) for all pairs i,j of bins, let Cij(r) be 
the number of times the trajectory has been in bin 
i at time t and in bin j at time t + r, summed 
over all time origins t; (ii) estimate the elements of 
T(t) by Ty(r) = Cij{T)/Y,k c ik( T )- A numerically 
superior approach is to use a reversible transition 
matrix estimator . 

3. Calculate the discrete stationary probability fj, and 
the discrete eigenvector xp 2 by solving the eigen- 
value equations: 



T(r)^ 2 



(23) 
(24) 



with the largest eigenvalues Ai = 1 < A2 < A3. T T 
denotes the transpose of the transition matrix. The 
ith element of the vectors /j, and ip 2 approximate 
the stationary density fi(y) and ip 2 on the respective 



point y 



Vi+Vi 



Functions fi(y) and ip 2 (y) can 



be obtained by some interpolation method. 

4. Estimate k 2 and a via an exponential fit of ae~ K ' lT 
to the tail of A 2 (r) = ($a(t)$a(t + r)) t . 

Note that this estimator is optimal in terms of minimizing 
the systematic error. When dealing with real data, the 
amount of statistics may set restrictions of how fine a 
discretization is suitable and how large a lag time tau will 
yield reasonable signal to noise. This issue is discussed 
in various other contributions 1331. 



In order to partition the estimated relaxation rate k 2 
to microscopic transition rates Icab, ^ba (for the A — > B 
and B — > A transitions, respectively), we need to define 
states A and B, i.e. split state space into two subsets A 
and B. We have seen above that any choice of a single di- 
viding surface will deteriorate the eigenfunction ip 2 and 
thus be a suboptimal choice for A and B. PCCA the- 
ory [14] tells us that there is a way of splitting state 
space into substates and at the same time maintain op- 
timal approximations to the exact eigenfunctions (here 
ip 2 ): The state assigment must be fuzzy, i.e. instead of 
choosing dividing surfaces that uniquely assign points x 
to either A or B, we have fuzzy membership functions 
Xa{x) and Xb{x) with the property xa{x) + Xb(x) = 1. 
These membership functions can be calculated after ip 2 is 
known. Following the derivation given in the appendix, 
they turn out to be: 



maxt/> 2 - 4> 2 t 
max xp 2 — mm ip 2 

^2,1 ~ nrint/> 2 

Xba — ~. ; ^~ 

max xp 2 — mm ip 2 



(25) 
(26) 



Since we are restricted to the projected eigenfunc- 
tion ?/>2, we can determine the optimal choice xa(x) and 
Xb(x) from ip 2 (x). Together with the estimated station- 
ary density fx(x) which can e.g. be obtained by comput- 
ing a histogram from sufficiently long equilibrium trajec- 
tories, the probability density of being in A and B when 
projected onto the observable y can be calulated: 



fJ-A,. 



XA,iVi 
XB.iVi 



(27) 
(28) 



The probability of being in A and B is thus given by: 

*A = Ys»A,i (29) 

i 

n B = Vb,i = 1 - KA (30) 

i 

and these probabilities can be used to split k 2 into 
microscopic transition rates ItAB and Uba- 



kAB = TT B K 2 
k-BA = 7TA^2 



(31) 
(32) 



Note that the assignment of labels "A" and "£?" to parts 



of state space is arbitrary. Eq. (311 is the transition rate 



from A to B as defined by Eqs. (25l-(26l, and Eq. (32 1 



is the corresponding transition rate from B to A. 

The results of spectral estimation are presented for a 
two-dimensional diffusion model projected onto different 
order parameters (Fig. [TJ and compared to a multi-r 
estimator using a dividing surface and a single-r MSM 
estimate. In contrast to the other estimators, the spectral 
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Figure 1: Illustration of the reaction coordinate quality (RCQ) a and different rate estimates using overdamped Langevin 
dynamics in a two-dimensional two-well potential that is projected onto different observables: perfect projection (first row), 
intermediate-quality projection (second row), poor projection (third row). For the multi-r estimators, the lag time r specifies 
the start of a time range [r, r + 10] used for an exponential fit. Column 1: Stationary density of the dynamical model, showing 
the two populated states in two dimensions and the projected stationary density in the back plane. In the poor projection, both 
states are indistinguishable in the observed density. Column 2: Estimated RCQ factor a from spectral estimation (red), and 
from exponential fitting to the number correlation function using a diving surface at y — (blue). With increasing overlap of 
the slowly-converting states, the RCQ a decreases and the direct MSM rate estimate deteriorates. While the spectral estimator 
yields an approximation to the true RCQ a, the a from dividing surfaces is significantly lower in rows 2 and 3. Column 3: 
Estimated relaxation rate k,2 using the second eigenvalue of a transition matrix at r (solid black), using an exponential fit to 
correlation function at the dividing surface at y — (blue), and spectral estimation (red). The black dotted line is the reference 
solution, obtained from a direct MSM estimate for r = 50 in row 1. Column 4: The transition rates Uab (dashed) and Uba 
(solid) calculated using the dividing surface estimate (blue) and from the partial densities of the spectral estimator (red). 



estimator yields results that have no or little systematic 
bias even for very poor reaction coordinates. Moreover, 
the statistical fluctuations of the spectral estimator are 
smaller than for the dividing surface estimator. Fig. [2] 
shows an application of the spectral estimator to real 
experimental data obtain from an optical tweezer experi- 
ment that probes the end-to-end distance fluctuations of 
the p5ab RNA hairpin. The RCQ a estimates show that 
the reaction coordinate used here is very good. Yet, the 
spectral estimation yields results for the transition rates 
kAB and Uba that are slightly different from the dividing 
surface estimates. 

An overview of the relative performances of three es- 
timators is given in Table |TJ Markov model, multi-r fit 
on a dividing surface-based number correlation function 
and spectral estimator. This scheme illustrates how these 
different estimators operate and that the spectral es- 
timator combines ideas from both Markov models and 
exponential-fitting estimators in order to combine their 



strenghts. 

Conclusions 

We have described a rate theory for ergodic and re- 
versible dynamical systems for which only an order pa- 
rameter that is potentially a poor reaction coordinate is 
available. Based on the theory, the performance of ex- 
isting rate estimators could be quantified and compared. 
For example, the relatively large systematic estimation 
error in the implied timescales / implied rates of Markov 
state models is explained. Following the theory, an op- 
timal estimator is proposed that we refer to as spectral 
estimator. The spectral estimator is new in that it pro- 
vides optimal estimates for three types of quantities: 

1. The reaction coordinate quality (RCQ) a, which is 
independent of any specific dynamical model and 
also does not need the definition of an "A" or "B" 
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Figure 2: Estimates for rates and the reaction coordinate quality (RCQ) from equilibrium single-molecule force probe exper- 
iments on an RNA hairpin. The equilibrium end-to-end fluctuation of the RNA were probed by an optical tweezer [BJ. For 
the multi-r estimators, the lag time r specifies the start of a time range [r, r+2.5 ms] that was used for an exponential fit. 
Three estimates are compared: The direct MSM estimate (black), a fit to the fluctuation autocorrelation function using a 
dividing surface at the histogram minimum y = 12.75 pN (blue), and spectral estimation (red), (a) schematic experimental 
setup, (b) the stationary probability of observing a given force value (solid black line). This probability can also be estimated 
from a by histogramming the measurement values. The partial probabilities of states A (orange) and B (green) obtained by 
spectral estimation show that there is a small overlap between the states. Notably, part of state A overlaps into state B. (c) 
the reaction coordinate quality (RCQ) a. The spectral estimator estimates the optimal value to be a ~ 0.96 at r = 5 ms while 
the best possible dividing surface results in a « 0.94 at r = 5 ms. (d) estimated relaxation rate «2- The MSM estimate is 
strongly biased and only approaches the true value after r = 20 ms. Both estimators based on exponential fits of the tail of 
the autocorrelation function (spectral estimator, dividing surface) agree on a rate of ~ 58 s _1 . (e) estimated microscopic 
transition rates fc^s (dotted) and ksA (dashed). The dividing surface estimator obtains rates that differ by 10-20% from the 
spectral estimate since it cannot correctly assign the probability of state B that is on the "A" side of the dividing surface. 



state and bounds the error in rates estimated from 
a given reaction coordinate. 

2. The dominant relaxation rate /c 2 , as well as the 
microscopic transition rates Izab and fc^i, even if 
A and B strongly overlap in the observable. 

3. The local probability densities, and hence projec- 
tions of the states A and B in the observable, ha{v) 
and /zs(y), as well as their total probabilities, it a 
and ttb- This information is also obtained if A and 
B strongly overlap in the observable. 

Other rate estimators that rely on fitting the exponen- 
tial tail of a time-correlation function calculated from the 
experimental recorded trajectories can also estimate K2 
without systematic error. However, the spectral estima- 
tor is unique in also being able to estimate fc^s, ksA, 
Ha{v)i Hb(u) and the RCQ in the presence of states that 
overlap in the observable order parameter. 

The present study has concentrated on systematic 
rate estimation errors that are expected in the data-rich 
regime. We expect that taking the statistical error into 
consideration will make the spectral estimator described 
here even more preferrable over more direct approaches 
such as fitting the number autocorrelation function of 
a dividing surface. This intuition comes from the fact 
that the spectral estimator maximizes the amplitude a 
with which the slow relaxation of interest is involved in 
the autocorrelation function. In the presence of statisti- 



cal noise, this will effectively maximize the signal-to-noise 
ratio and thus lead to an advantage over fitting autocorre- 
lation functions that were obtained differently. However, 
a systematic treatment of both statistical and systematic 
error should be made but goes beyond the present study. 

The present idea of building an optimal estimator for a 
single relaxation rate upon the transition matrix estimate 
of the projected slowest eigenfunction, ip2, is readily ex- 
tensible to multiple relaxation rates. This lays the basis 
for a multi-state rate theory for low-dimensional observa- 
tions of dynamical systems that will be pursued in future 
studies. 
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(A) Good reaction coordinate 



Observable stationary 
distribution (histogram-density) 



Eigenfunction approximant ip2{y) 



Autocorrelation function 
A 2 (t) = {^2{yt)^2{yt+r))t (black) 
and rate fitting function (green) 
on log scale. 



Dissection of observable 
stationary distribution into 
partial populations of substates 

Estimation results 
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correct 
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correct 



(B) Poor reaction coordinate 



Observable stationary 
distribution (histogram-density) 



Eigenfunction approximant xp2 (y) 



Autocorrelation function 
A 2 (t) = {^2{yt)^2{yt+r))t (black) 
and rate fitting function (green) 
on log scale. 



Dissection of observable 
stationary distribution into 
partial populations of substates 

Estimation results 



i4v) 

A 




Relaxation rate K2 


wrong 


correct (but numerically 
unstable) 


correct 


Reaction coordinate quality a 


N/A 


wrong 


correct 


Substate probabilities tva, kb 


approximate 


wrong 


approximate 


Microscopic rates Icab, &ba 


wrong 


wrong 


approximate 



Table I: Scheme comparing spectral estimator (right column) with Markov models constructed from a fine binning along the 
observable y-coordinate (left column) and an exponential fit to the number correlation function obtained from a dividing surface 
definition (middle column). Two two-state systems are compared: (A) observed on a nearly perfect reaction coordinate where all 
three estimators yield the correct rates, yet the spectral estimator is the only one that provides the correct reaction coordinate 
quality and microscopic rates. (B) observed on a poor reaction coordinate, in which the two substates strongly overlap. Only 
the spectral estimator can retrieve the correct rates and substate populations. 
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